ABSTRACT

The CGG repeats are present in the 5'-untranslated
region (5'-UTR) of the fragile X mental retardation
gene FMR1 and are associated with two diseases:
fragile X-associated tremor ataxia syndrome 
(FXTAS) and fragile X syndrome (FXS). FXTAS 
occurs when the number of repeats is 55-200 and 
FXS develops when the number exceeds 200. FXTAS 
is an RNA-mediated disease in which the expanded 
CGG tracts form stable structures and sequester 
important RNA binding proteins. We obtained and 
analysed three crystal structures of double-helical 
CGG repeats involving unmodified and 8-Br 
modified guanosine residues. Despite the presence 
of the non-canonical base pairs, the helices retain 
an A-form. In the G-G pairs one guanosine is always 
in the syn conformation, the other is anti. There are 
two hydrogen bonds between the Watson-Crick
edge of G(anti) and the Hoogsteen edge of G(syn):
O6-N1H and N7-N2H. The G(syn)-G(anti) pair shows
affinity for binding ions in the major groove. G(syn)
causes local unwinding of the helix, compensated 
elsewhere along the duplex. CGG helical structures 
appear relatively stable compared with CAG and 
CUG tracts. This could be an important factor in 
the RNA's ligand binding affinity and specificity. 

INTRODUCTION 

Tandem repeats of the CGG trinucleotide motif are
abundant in the human genome and occur in numerous
genes and transcripts (1). The repeat tracts are often
polymorphic in length in the human population and may
play a regulatory role in gene expression. The CGG repeats
present in the 5'-untranslated region (5'-UTR) of the
fragile X mental retardation gene FMR1 are associated
with several distinct phenotypes (2). In the normal population
the number of CGG repeats varies in the range 5-54 (2,3);
45-54 repeats fall into a subclass named 'the grey zone',
associated with an increased likelihood of intergenerational
pathogenic expansions (2,4). Tracts of 55-200 CGGs are
premutations that can cause the progressive neurodegenerative
disorder fragile X-associated tremor ataxia syndrome (FXTAS)
in elderly males (5,6). Female premutation carriers are at
risk of developing premature ovarian failure (7). More than
200 CGG repeats are full mutations resulting in fragile X
syndrome (FXS), the most common inherited mental retardation
syndrome in man (8).

FXTAS is an RNA-mediated disease in which the level
of FMR1 mRNA is significantly elevated (9,10). The
expanded tracts form stable structures and sequester
important RNA binding proteins that are normally
required for splicing and other cellular processes (11,12).
The protein sequestration results in the formation of
intranuclear inclusions in neurons and astrocytes (12,13)
and, in model cellular systems, triggers the dynamic formation
of aggregates and the deregulation of alternative splicing of
a number of genes (11). In addition, the presence of
stable CGG structures inhibits FMR1 translation at the
initiation step, resulting in a deficiency of the encoded
FMRP protein (9,14,15). FMRP is an important RNA
binding protein involved in mRNA trafficking between
the cell nucleus and the cytoplasm and in regulating
translation at the synapse (16).

The structure of the CGG repeats in FMR1 transcripts
is a hairpin whose stem is formed by alternating C-G,
G-C and non-canonical G-G base pairs (17).
Isolated (CGG)20 repeats form the thermodynamically
most stable hairpins of all the (CNG)20 repeats (where N
is A, C, G or U) (18,19), which indicates that G-G
pairs are the strongest of all homobasic interactions.
According to an NMR study (20), the opposing G-G
bases are highly dynamic in the CGG repeat hairpin, with
one G residue in the anti and the other in the syn conformation.
Short CGG repeats were shown to form duplex structures
(20,21) and, in the presence of potassium ions,
G-tetraplexes (22-24). Thus, the RNA structure formed
by CGG repeats is less clearly defined than that of
CUG repeats (18,19,25) and CAG repeats (19,26-28),
for which crystal structures have been determined (29-31).

In this study, we report three crystal structures of 
double-helical CGG repeats containing native and 8-Br 
modified guanosine residues. 

MATERIALS AND METHODS 

Synthesis, purification and crystallization of 
oligoribonucleotides 

The GCGGCGGC, GC(8-BrG)GCGGC and GC(8-BrG)GCGGCGGC
oligomers were synthesized on an Applied Biosystems
DNA/RNA synthesizer using cyanoethyl phosphoramidite
chemistry. Commercially available C and G phosphoramidites
with 2'-O-tert-butyldimethylsilyl protection were used for
the synthesis of RNA (Glen Research, Azco, Proligo). The
8-Br guanosine phosphoramidite was synthesized according
to Proctor et al. (32). The details of deprotection and
purification of the oligoribonucleotides were described
previously (33).

All crystals were grown by the hanging-drop vapour
diffusion method at 19°C. A single crystal of
GC(8-BrG)GCGGCGGC grew in 10 months. The reservoir
initially contained 10 mM MgCl2, 50 mM Na cacodylate,
pH 6.0, and 1.0 M Li2SO4. The crystallization drop
initially contained 3 µl of RNA at 10 mg/ml and 1 mM
MgCl2, and 3 µl of the reservoir solution. Crystals of
GCGGCGGC grew in several days from the same
solution as above but required crystal seeding, with a
starting RNA concentration of 2.4 mg/ml. The initial crystals
grew as clusters of small needles, which were then used
for seeding. The seeded crystals grew from a similar
solution but with an RNA concentration of 1.2 mg/ml.
They appeared as single blocks but were in fact clusters
of crystals that had to be separated. Crystals of
GC(8-BrG)GCGGC grew in 2 months. The reservoir
initially contained 10 mM CaCl2, 0.2 M NH4Cl, 50 mM
Tris-HCl at pH 8.5 and 30% w/v PEG 4000. The
crystallization drop initially contained 2 µl of RNA at
10 mg/ml and 1 mM MgCl2, and 2 µl of the reservoir solution.

X-ray data collection, structure solution and refinement 

X-ray diffraction data were collected at 100 K: from
GCGGCGGC on beamline BL14.2 at the BESSY synchrotron
(Berlin) to 2.05 Å resolution, and from GC(8-BrG)GCGGCGGC
and GC(8-BrG)GCGGC on EMBL beamline X11 (DESY, Hamburg)
to 1.45 Å and 0.97 Å resolution, respectively. The crystals
were cryoprotected with 20% (v/v) glycerol in the mother
liquor. The data were integrated and scaled using the
DENZO/SCALEPACK program suite (34).

The structure of GC(8-BrG)GCGGC was solved first.
SHELXD was used to locate the Br atoms by analysing
the Patterson function based on the anomalous signal (35).
The program identified two sites at least five times higher
than any other peaks. SHELXE was used to identify the
correct enantiomorph and to calculate initial phases (35);
the resulting electron density map was uninterpretable in
terms of an atomic model but showed parallel columns of
density indicating stacked RNA duplexes. DM was used for
density modification, but the resulting maps were still
uninterpretable (36). The phases from DM were then used in
a free-atom phase refinement with 'shaking' of the model,
using ARP/wARP (37). In a total of 100 cycles of
adding/removing atoms, with nine rounds of shaking
interspersed, the R-factor/R-free statistics initially
remained apparently random, but after 25 cycles they started
dropping and in the final 50 cycles collapsed to
0.188/0.256. The free-atom electron density map showed the
RNA atoms clearly resolved and a well-defined solvent
structure; the map looked very similar to the final map for
the refined model. Both the GC(8-BrG)GCGGCGGC and
GCGGCGGC structures were solved by molecular replacement
using PHASER (38). The initial models were poor but
sufficient for the purpose of refinement and model
extension. Manual rebuilding and map inspection were done
using Coot (39).
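The R-factor/R-free figures quoted above follow the standard crystallographic definition R = Σ||Fo| − |Fc|| / Σ|Fo|, with R-free computed over a randomly chosen test set that is excluded from refinement (5% of the reflections in this work, per Table 1). The sketch below only illustrates that bookkeeping in plain Python. It assumes Fc has already been scaled to Fo, and the function names are hypothetical; they are not part of SHELX, ARP/wARP, Refmac5 or Phenix.

```python
import random
from typing import List, Sequence, Set, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); assumes Fc is already on the Fo scale."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def pick_free_set(n_reflections: int, fraction: float = 0.05, seed: int = 0) -> Set[int]:
    """Randomly flag a fraction of reflection indices as the R-free test set."""
    rng = random.Random(seed)
    n_free = max(1, round(fraction * n_reflections))
    return set(rng.sample(range(n_reflections), n_free))


def r_work_and_free(
    f_obs: Sequence[float], f_calc: Sequence[float], free: Set[int]
) -> Tuple[float, float]:
    """Return (R-work, R-free): R over the working set and over the test set."""
    work_idx: List[int] = [i for i in range(len(f_obs)) if i not in free]
    free_idx: List[int] = sorted(free)
    r_work = r_factor([f_obs[i] for i in work_idx], [f_calc[i] for i in work_idx])
    r_free = r_factor([f_obs[i] for i in free_idx], [f_calc[i] for i in free_idx])
    return r_work, r_free


# Toy amplitudes (hypothetical), not data from this study.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(2000)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]
free = pick_free_set(len(f_obs))
print(len(free), r_work_and_free(f_obs, f_calc, free))
```

Because the test reflections never enter the refinement, a model that merely fits noise lowers R-work without a matching drop in R-free.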

All three structures were refined using Refmac5 (40) and
Phenix (41). The final model of [GC(8-BrG)GCGGC]2 was
refined without restraints and with anisotropic temperature
factors. The last few cycles of the [GC(8-BrG)GCGGC]2
refinement were performed using all data, including the
Rfree set. The other two models, GCGGCGGC and
GC(8-BrG)GCGGCGGC, were refined using isotropic B-factors.

Helical parameters were calculated using 3DNA (42).
Sequence-independent measures based on vectors connecting
the C1' atoms of the paired residues were used to avoid
computational artefacts arising from non-canonical base
pairing. The program PBEQ-Solver (43) was used to
calculate the electrostatic potential map. All pictures
were drawn using PyMOL v0.99rc6 (44). The coordinates of
the crystallographic models have been deposited in the
Protein Data Bank (PDB) with accession codes 3R1C, 3R1D
and 3R1E.



RESULTS AND DISCUSSION 

The overall structures 

The RNA in all three crystal structures has the form of
duplexes stacking end-to-end and forming bundles of
parallel columns. The crystals of the native RNA
contain 18 (GCGGCGGC)2 duplexes in the P1 unit cell.
Each column consists of all 18 independent duplexes
stacked consecutively (Supplementary Figure 1). In the
atomic-resolution C2 structure, there is one
[GC(8-BrG)GCGGC]2 duplex in the asymmetric unit. The other
monoclinic crystal contains five crystallographically
independent RNA strands. They form three
[GC(8-BrG)GCGGCGGC]2 duplexes, the third consisting of two
symmetry-related strands. The native RNA and
[GC(8-BrG)GCGGCGGC]2 structures contain sulphate ions from
the crystallization medium, while the [GC(8-BrG)GCGGC]2
crystals contain Ca2+. All the ions interact with the G-G
pairs in the major groove (details below). The final models
are summarized in Table 1.
Table 1. Summary of the X-ray data and model refinement for (GCGGCGGC)2, [GC(8-BrG)GCGGCGGC]2 and [GC(8-BrG)GCGGC]2
c Rfree was calculated using 5% of the total reflections chosen randomly and omitted from the refinement.



Non-canonical G-G pairing and its consequences for the
duplex

Despite the presence of the non-canonical base pairs, the
helices have a typical A-form, with helical twist values
in the range 30-32° for all the duplexes, C3'-endo or, in
some cases, C2'-exo sugar puckers, and Zp values of
2.63 ± 0.22 Å (for the A-form it should be more than
1.5 Å). The helices contain C-G and G-C base pairs
with typical Watson-Crick interactions. Between them
are the non-canonical G-G pairs, in which one guanosine
is always in the syn conformation while the other is anti
(Figure 1), the arrangement experimentally shown to be
preferred in a double-helical context (45-47). The syn-anti
geometry of this base pair can be described as G/G cis
Watson-Crick/Hoogsteen according to the nomenclature
proposed by Leontis and Westhof (48). There are two
hydrogen bonds between the guanosine residues: the
carbonyl oxygen O6 of G(syn) is bonded to N1H of G(anti),
and N7 of G(syn) to the exo-amino group (N2H) of G(anti).
All the H-bond distances are in the range 2.6-3.3 Å. The
conformation of the G(syn) residue is additionally
stabilized by a hydrogen bond between its exo-amino group
and its phosphate oxygen atom (3.0 Å).
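The pucker labels used above (C3'-endo, C2'-exo) correspond to 36° sectors of the pseudorotation phase angle P. As a minimal sketch, assuming the five endocyclic torsions ν0 to ν4 have already been measured (ν0 = C4'-O4'-C1'-C2', ν1 = O4'-C1'-C2'-C3', ν2 = C1'-C2'-C3'-C4', ν3 = C2'-C3'-C4'-O4', ν4 = C3'-C4'-O4'-C1'), the Altona-Sundaralingam phase angle and amplitude can be computed as below. The function names are hypothetical, and this is a generic illustration rather than the code 3DNA runs.

```python
import math
from typing import List, Sequence, Tuple

# Conventional pucker names for the ten 36-degree sectors of P, starting at P = 0.
PUCKER_SECTORS = (
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
)
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(nu: Sequence[float]) -> Tuple[float, float]:
    """Altona-Sundaralingam phase angle P (0-360) and amplitude tau_m, in degrees."""
    nu0, nu1, nu2, nu3, nu4 = nu
    y = (nu4 + nu1) - (nu3 + nu0)  # proportional to tau_m * sin(P)
    x = 2.0 * nu2 * _S             # proportional to tau_m * cos(P)
    p = math.degrees(math.atan2(y, x)) % 360.0
    tau_m = math.hypot(x, y) / (2.0 * _S)
    return p, tau_m


def pucker_name(p: float) -> str:
    """Map a phase angle onto its 36-degree pucker sector."""
    return PUCKER_SECTORS[int((p % 360.0) // 36.0)]


def ideal_torsions(p: float, tau_m: float) -> List[float]:
    """Ideal ring torsions nu_j = tau_m * cos(P + 144*(j - 2)), j = 0..4 (for testing)."""
    return [tau_m * math.cos(math.radians(p + 144.0 * (j - 2))) for j in range(5)]


p, tau = pseudorotation(ideal_torsions(18.0, 40.0))
print(pucker_name(p))  # C3'-endo
```

Using `atan2` rather than a plain arctangent keeps P in the correct quadrant when ν2 is negative, as it is for puckers in the southern half of the pseudorotation cycle.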

Figure 1. Non-canonical G(syn)-G(anti) pair. Hydrogen bonds are
drawn with dashed lines. The solid line connecting the C1' atoms gives a
measure of strand separation. Angles λ are marked (see text).

The syn-anti arrangement avoids a steric clash
between the two bulky guanines within the helical
structure. This is evident from the C1'-C1' distances between
the paired residues: 11.3 ± 0.1 Å for G-G, compared with
10.7 ± 0.2 Å for the canonical C-G pairs. The angle λ between
the glycosidic bond and the line connecting the C1' atoms
of each pair is 33 ± 4° for the G(syn) residues and 64 ± 3°
for G(anti), compared with 54 ± 3° for the other residues.
This means that G(syn) is shifted towards the minor
groove and G(anti) towards the major groove, which
optimizes the H-bonding interactions between the
Hoogsteen and Watson-Crick edges while avoiding a
clash between the carbonyl oxygen atoms (Figure 1).
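Both geometric descriptors used above come directly from atomic coordinates. The C1'-C1' distance is a plain Euclidean distance, and λ is the angle at C1' between the glycosidic bond (C1'-N9 for purines, C1'-N1 for pyrimidines) and the line to the partner's C1'. The sketch below assumes the coordinates are already available as (x, y, z) tuples in Å. The example coordinates are hypothetical and are not taken from the deposited models.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [ai - bi for ai, bi in zip(a, b)]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def c1_c1_distance(c1_a: Vec3, c1_b: Vec3) -> float:
    """Distance (Å) between the C1' atoms of two paired residues."""
    return _norm(_sub(c1_b, c1_a))


def lambda_angle(c1_self: Vec3, n_glyc: Vec3, c1_partner: Vec3) -> float:
    """Angle (degrees) at C1' between the glycosidic bond C1'->N9 (purines)
    or C1'->N1 (pyrimidines) and the line to the partner residue's C1'."""
    u = _sub(n_glyc, c1_self)
    v = _sub(c1_partner, c1_self)
    cos_l = _dot(u, v) / (_norm(u) * _norm(v))
    cos_l = max(-1.0, min(1.0, cos_l))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_l))


# Hypothetical coordinates (Å): a canonical-like pair with lambda = 54 degrees.
c1_a = (0.0, 0.0, 0.0)
c1_b = (10.7, 0.0, 0.0)
n9_a = (1.47 * math.cos(math.radians(54.0)), 1.47 * math.sin(math.radians(54.0)), 0.0)
print(round(c1_c1_distance(c1_a, c1_b), 2), round(lambda_angle(c1_a, n9_a, c1_b), 1))  # 10.7 54.0
```

A λ well below the canonical value, as for G(syn) here, means the glycosidic bond leans towards the C1'-C1' line, which is what places that base closer to the minor-groove side.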

Nucleic Acids Research, 2011, Vol. 39, No. 16 7311

Figure 2. The G(syn) residue in a helical context is nearly co-planar with
the adjacent cytosine (pink). The torsion angles α and γ (cyan) are flipped
with respect to their typical A-form values, and the helix is locally unwound.

The G(syn) residues also show unusual α and γ
backbone torsion angles. The α angle, describing
rotation about the P-O5' bond, is +ac, +ap or, in one
case, −ap, instead of the typical −sc. The angles range
from 107° to 182°, with an average value of 142°, almost
half a turn from the mean −sc value of −60°. The γ angle,
describing rotation about the C5'-C4' bond, is +ap or −ap
in the G(syn) residues, ranging from 152° to −152° (through
180°). This is ~120° from the usual +sc value and amounts
to a flipping of the O5'-C5' bond and a local 'straightening'
of the sugar-phosphate backbone (Figure 2). The unusual
torsion of the backbone is similar to the 'extended'
conformation observed by Haran et al. (49) in a CG step of
a DNA helix. In the present study this conformation seems
to be necessary for the simultaneous occurrence of
(i) Hoogsteen-Watson-Crick pairing of the guanosines;
(ii) the H-bond between the exo-amino group of G(syn) and
its phosphate oxygen; and (iii) stacking against the
neighbouring cytosine. Together, these effects explain how
the G-G pair is accommodated within the helix.
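The α (O3'(i−1)-P-O5'-C5') and γ (O5'-C5'-C4'-C3') values above are ordinary four-atom torsion angles, reported with Klyne-Prelog labels (sp, sc, ac, ap with a sign). The sketch below computes a torsion from four coordinates using the usual IUPAC sign convention and assigns the label with 30°, 90° and 150° boundaries. Coordinates would come from the model. The function names are hypothetical, and sector edges are treated as half-open.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [ai - bi for ai, bi in zip(a, b)]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a: Vec3, b: Vec3) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180], IUPAC sign convention."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = [c / n1 for c in b1]
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = [c - d0 * u for c, u in zip(b0, b1)]
    w = [c - d2 * u for c, u in zip(b2, b1)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def klyne_prelog(angle: float) -> str:
    """Signed Klyne-Prelog label (sp/sc/ac/ap) with 30/90/150-degree boundaries."""
    a = (angle + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    sign = "+" if a >= 0.0 else "-"
    mag = abs(a)
    if mag < 30.0:
        return sign + "sp"
    if mag < 90.0:
        return sign + "sc"
    if mag < 150.0:
        return sign + "ac"
    return sign + "ap"


# Alpha values quoted in the text: range ends, mean, and the typical -sc value.
print([klyne_prelog(a) for a in (107.0, 142.0, 182.0, -60.0)])  # ['+ac', '+ac', '-ap', '-sc']
```

Wrapping 182° to −178° shows why the upper end of the α range is labelled −ap rather than +ap.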

The local straightening of the sugar-phosphate
backbone, amounting to a local unwinding of the helix,
is compensated elsewhere along the duplex, and the
overall statistics do not deviate from typical values
(Supplementary Tables 1-4). This differs from the
case of CAG repeats (31), where a similar inversion of
the α and γ angles is associated with an overall unwinding
of the helix and broadening of the major groove to >20 Å
(measured as the distance between the lines connecting P
atoms). For the native CGG-containing duplexes, the
average width of the major groove was 17.9 ± 0.9 Å; for
the longer Br-modified duplex it was 17.8 ± 2.5 Å, and for
the shorter modified structure 14.3 Å. The corresponding
values for the minor groove were 15.8 ± 0.5 Å, 15.4 ± 0.5 Å
and 16.1 Å. These values are not out of the ordinary for the
A-form.

Another effect of the G(syn) conformation is that the
guanine does not stack on the downstream G-C pair;
instead it stacks against the preceding pair of
residues (Figure 3).

The effect of bromination and the distribution of G-G 
conformers 

All the 8-Br modified guanosine residues are in the syn
conformation; the pairs are therefore 8-BrG(syn)-G(anti).
As observed before, duplexes containing the modified
residues are more stable than the corresponding native
duplexes. The melting temperature of [GC(8-BrG)GCGGC]2
is 13°C higher than that of (GCGGCGGC)2 (21). In structural
terms this is probably due to a restricting effect of Br on
the conformational freedom and to the exclusion of
unfavourable G(anti)-G(anti) interactions. The
crystallographic structures bear this out in the sense
that the modified duplexes contain only well-ordered
8-BrG(syn)-G(anti) pairs, while in the native duplexes
each G-G pair is observed in one of three possible
arrangements: G(syn)-G(anti), G(anti)-G(syn) or a statically
disordered mixture of the two (in 2 out of 36 pairs). The
three base-pairing arrangements occur with different
frequencies and in various combinations of pairs along
the native duplex. Symmetric arrangements are clearly
favoured: anti-syn followed by syn-anti or vice versa (in
14 out of 18 cases). There is a slight preference, which
may be fortuitous, for the former arrangement (8 cases as
opposed to 6). Of the remaining four duplexes, two are
clearly asymmetric and two contain statically disordered
G-G pairs (Supplementary Table 5). In the longer
modified duplexes, which contain an unmodified G-G pair in
the middle, two of these three unmodified G-G pairs are
disordered, showing both possible conformations; the third
is ordered.
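The caution that the 8-versus-6 preference may be fortuitous can be checked with an exact two-sided sign (binomial) test. This rests on two assumptions not made in the text: that the two symmetric arrangements are equally likely a priori, and that the 14 symmetric duplexes can be treated as independent observations. The calculation is illustrative only and does not appear in the original analysis.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided sign-test p-value for k of n outcomes, null probability 0.5."""
    tail = min(k, n - k)
    # Probability of a split at least this uneven in one direction...
    p_one = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    # ...doubled for the two-sided test (the null distribution is symmetric).
    return min(1.0, 2.0 * p_one)


p = two_sided_binomial_p(8, 14)
print(f"{p:.3f}")  # 0.791
```

A p-value of about 0.79 means that a split at least as uneven as 8:6 would be expected most of the time by chance alone, which is consistent with the cautious wording above.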

Apart from restricting the conformational freedom, and
thus defining the conformation of the G-G pair, there is
little crystallographic evidence that bromination alters the
structure compared with the native RNA. The native and
brominated duplexes can be superposed with an r.m.s.
deviation of ~1 Å, and the similarity is also reflected in the
helical parameters. In terms of interactions with the
solvent, the Br atom seems to displace a water molecule
that in native G(syn) is located ~3.2 Å from C8, in the
minor groove, but its main effect appears to be steric. In
terms of H-bonding capacity, bromination lowers the pKa
of guanosine from 9.3 to 8.4 (50), but this is likely to be
insignificant.
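The r.m.s. deviation quoted above is the square root of the mean squared distance between equivalent atoms after the two models have been superposed. The sketch below covers only that final averaging step. It assumes the superposition (for example a least-squares fit) has already been performed and that atoms are matched one-to-one in the same order. The function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model_a: Sequence[Coord], model_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms of two already-superposed models."""
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must contain the same, non-zero number of matched atoms")
    sq = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b)
    )
    return math.sqrt(sq / len(model_a))


# Hypothetical two-atom models offset by 1 Å along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # 1.0
```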

Solvent interactions and hydration 

The exposed Watson-Crick edges of G(syn) residues
interact with sulphate anions. In some instances in the
native RNA structure and in the longer modified duplex,
the sulphate appears well ordered, but in most cases its
orientation is poorly defined. The anions can be
distinguished from water molecules by the size and shape of
the electron density and by the interaction distance from
the RNA (Figure 4B). In the high-resolution structure,
where there was no sulphate in the crystallization
medium, an inner-sphere complex is observed between the
carbonyl oxygen atom of G(anti) and a hydrated calcium
cation (Figure 4A). The conditions in the crystallization
medium are far from physiological; nevertheless, the
observed complexes indicate the potential of the
solvent-exposed Hoogsteen-Watson-Crick edge for
attracting charged species, especially as no interactions
with ions are observed elsewhere in the structures. In the
absence of sulphate, the G(syn) Watson-Crick edge is
hydrated by three water molecules that form a crest
co-planar with the guanine. In the absence of Ca2+, the
G(anti) Hoogsteen edge is hydrated by two or three ordered
water molecules.
Figure 3. Stacking interactions in the CGG duplex structure. (A and B) The non-canonical G-G pairs (aquamarine); (C) the canonical CG/GC step.




Figure 4. The hydration of G(syn)-G(anti) pairs in the
GC(8-BrG)GCGGC (A) and GC(8-BrG)GCGGCGGC (B)
structures. Ca2+ (green) is bound directly to the carbonyl oxygen
atom of G(anti); a sulphate anion interacts with the WC edge of
G(syn). The 2Fo-Fc electron density map is contoured at the 1σ level.



Interestingly, in all the examined instances of the
structure that have sulphate anions in the major groove,
the major groove widths are similar and close to 18 Å (see
above), whereas in the Ca2+-containing structure the groove
is narrower by nearly 4 Å. It is possible that the presence
of the anions stabilizes the width of the major groove.

Electrostatic surface potential 

The electrostatic surface potential shows the already
familiar pattern of alternating stripes of positive and
negative potential in the minor groove, similar to the
distribution previously observed in CUG and CAG repeats
(Figure 5). The pattern is due primarily to the C-G and
G-C pairs rather than to the interposed N-N pairs. The
major groove in the CGG structures is mostly electronegative,
with positive areas each generated by the Watson-Crick
edge of G(syn) and the exo-amino group of the
preceding cytidine residue. The binding of the sulphate
ions corresponds very closely to the electropositive
features associated with G(syn) residues, whose
calculated surface potential indeed appears higher than
that of the adjacent cytosines, which have not attracted any
ions. The difference in ligand-binding potential can be
explained by the exposed Watson-Crick edge of the
guanine in this position. In addition, the guanines have
stacking interactions on only one side, while the cytosines
are engaged on both faces.

It is harder to explain, just by examining the electrostatic
potential, what distinguishes CGG from CUG or CAG,
and why CGG is not recognized by the MBNL1 protein
(51-53). An answer could lie in the stability of the CGG
duplex. The non-canonical G-G pair contains two
H-bonds, as opposed to a single bond in U-U and a
weak C-H···N bond in A-A. Therefore the CGG
tracts could be more stable in the duplex form and less
accessible to the protein, which appears to bind
single-stranded RNA (54). This is consistent with the
thermodynamic parameters: duplex formation is markedly
more favourable (more negative ΔG) for CGG-containing
duplexes than for the other three repeats, which are
similar to one another (18,19,21).
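The stability argument can be made semi-quantitative with the standard relation K = exp(−ΔG/RT). A free-energy difference ΔΔG between two duplexes changes the equilibrium constant for duplex formation by a factor of exp(ΔΔG/RT). The sketch uses a hypothetical ΔΔG of 1 kcal/mol at 37°C. It does not use the measured values from references 18, 19 and 21, which are not reproduced here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def equilibrium_ratio(ddg_kcal: float, temp_k: float = 310.15) -> float:
    """Factor by which an equilibrium constant changes for a free-energy
    difference ddG (kcal/mol) at temperature temp_k: exp(ddG / RT)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


# Hypothetical 1 kcal/mol extra stabilization at 37 degrees C (310.15 K).
print(equilibrium_ratio(1.0))
```

On this scale, even 1 kcal/mol of extra stabilization shifts the equilibrium roughly five-fold towards the duplex. This illustrates how modest differences in pairing strength could limit access for a protein that binds single-stranded RNA.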
Figure 5. The electrostatic surface potential of two consecutive
duplexes of the [GC(8-BrG)GCGGCGGC]2 structure. Red is
negative, blue is positive. Sulphate anions (sticks) are shown interacting
in the major groove.



CONCLUSIONS 

This is the third in a series of studies aimed at profiling CNG
structures at crystallographic resolution to facilitate drug
design and provide 3D templates for rationalizing
biochemical and cytological observations. The need for
detailed RNA structures is clearly signalled in the
literature on the rational design of CNG-binding ligands (55-57).

The crystal structures of unmodified and modified CGG
repeats presented here are consistent with one another and
allow a general description of double-stranded CGG tracts
and a comparison with the CUG and CAG structures reported
previously (29,31). The foremost common feature is that all the
known CNG structures form A-helices stabilized by
C-G and G-C pairs acting as sturdy struts. The variety
is provided in between by the non-canonical pairs.



The 'accommodation problem' is solved differently for
each N-N pair, but in every case the disruption of the
helix is surprisingly small. The bulky guanines fit within
the helical constraints by a 180° flip about the glycosidic
bond of one of the bases. The equally bulky adenines
retain the anti conformation but are shifted away from the
helical axis, towards the major groove. In the one
known example of paired pyrimidines, the relatively
small uracil rings remain some distance apart, making
only one direct hydrogen bond instead of the two they
can form in other environments.

Despite the general resilience of the A-form,
characteristic differences can be observed between the CNG
duplex structures. The parameters that seem to be especially
sensitive are helical twist and major groove width. In CGG
helices, we observe local unwinding of the helix around the
G-G pairs, as opposed to the more general unwinding in
the case of CAG structures. The major groove width may
depend on the nature of the ligand that occupies it,
bound to the G-G pair. This work provides examples of
bound calcium and sulphate ions; another example
of sulphate interacting with G-G pairs and possibly
affecting the groove width is provided by Adamiak and
colleagues (58).

Thermodynamic stability is an important property that
is difficult to investigate by crystallography, but some
measure of it is provided by the nature and number of
hydrogen bonds and the extent of stacking interactions.
In this respect, CGG helical structures appear relatively
stable compared with CAG and CUG tracts, in agreement
with calorimetric studies (19,21). This could be an
important factor in the RNA's ligand binding affinity and
specificity.

So far, all three known CNG repeat structures have been
observed to bind small ligands in the major groove.
Glycerol, sulphate anions and Ca2+ cations were found to
be associated with the N-N pairs of the CNG duplexes. The
interactions depend on hydrogen bonds formed with the
functional groups exposed in the major groove, which are
characteristic of each N-N pair, and to some extent on the
electrostatic charge distribution. These can be taken as the
main indicators in designing specific ligands.

A detailed comparison of CGG, CAG and CUG duplex 
structures is provided in Supplementary Table 6. 